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Abstract 

In this paper, a hard thresholding wavelet estimator is constructed for a deconvolution model 
in a periodic setting that has long-range dependent noise. The estimation paradigm is based 
on a maxiset method that attains a near optimal rate of convergence for a variety of 5£ v 
loss functions and a wide variety of Besov spaces in the presence of strong dependence. The 
effect of long-range dependence is detrimental to the rate of convergence. The method is 
implemented using a modification of the WaveD-package in R and an extensive numerical 
study is conducted. The numerical study supplements the theoretical results and compares 
the LRD estimator with naively using the standard WaveD approach. 
Keywords: Besov Spaces, Deconvolution, fractional Brownian motion, Long-Range 
Dependence, Maxiset theory, Wavelet Analysis 
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1. Introduction 

Nonparametric estimation of a function in a deconvolution model has been studied widely 
in various contexts. We study the deconvolution model with a long range dependent (LRD) 
error structure. More specifically, we consider the problem of estimating a function / after 
observing the process, 

dY(x) = K * f(x)dx + e a dB H (x), a: 6 [0,1]; (1) 

where K * f(x) = K(t)f(x — t)dt is the regular convolution operator, e x n -1 / 2 , Bu 
is a fractional Brownian motion. The fractional Brownian motion is defined as a Gaussian 
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s process with zero mean and covariance structure, 

EB H (t)B H (s) = ~(\s\ 2H +\t\™-\t-s\* H ) 

9 and a = 2 — 2H G (0, 1] is the level of long-range dependence (where H denotes the standard 

10 Hurst parameter). The assumption of an i.i.d. error structure is captured as a special case 

11 of (1) with the choice a = 1 which reduces the model to a standard Brownian motion 

12 structure. The convolution operator K is assumed to be of the regular-smooth type such 

13 that the Fourier coefficients 

\K[u}\-\u\' u for all wel, (2) 

14 where v > and K[u] denotes the Fourier transform K[u] = J-K[w] '■— j R e~ 27TluJX K(x) dx. 

15 Deconvolution is a common problem occuring in several areas such as econometrics, 

16 biometrics, medical statistics and image reconstruction. For example, the method can be 

17 applied to the light detection and ranging (LIDAR) techniques and image de-blurring tech- 
ib niques. The parameter v > is often referred to as the degree of ill-posedness (DIP) with 

19 v — denoting the well-posed or direct case. 

20 Various wavelet methods have been constructed to address the deconvolution problem 

21 over the last two decades (see for example Donoho (1995); Wang (1997); Abramovich and Silverman 

22 (1998); Walter and Shen (1999); Fan and Koo (2002); Donoho and Raimondo (2004); Johnstone and Rail 

23 (2004); Johnstone et al. (2004); Kalifa and Mallat (2003); Pensky and Sapatinas (2009)). 

24 In the standard deconvolution models, the assumption of i.i.d. variables is made. How- 

25 ever, empirical evidence has shown that even at large lags, the correlation structure in 

26 variables can decay at a hyperbolic rate. To account for this, an extensive literature on LRD 

27 variables has emerged to describe this phenomena. Areas of applications of LRD analysis 
2B include economics with financial returns, volatility and stock trading volumes; hydrology in 

29 rainfall and temperature data; and computer science with data network traffic data. There 

30 are many more applications of LRD analysis and the interested reader is referred to Beran 

31 (1992, 1994) and Doukhan et al. (2003) for more details. Some analysis has been done for the 

32 direct model {y = 0) with LRD errors in works such as Wang (1996); Kulik and Raimondo 

33 (2009b). The topic of density deconvolution with LRD has been studied by Kulik (2008); 

34 Chesneau (2012). 

35 The aim of this paper is to study a wavelet deconvolution algorithm that can be easily 

36 applied in the context of a deconvolution problem with LRD errors as in model (1). Minimax 
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37 rates of estimation of / have been established in our context by Wang (1997) for the squared- 

3B error loss. However, their method uses the Wavelet- Vaguelette Decomposition (WVD) which 

39 is a sophisticated transform where, to the authors knowledge, there is no freely available 

40 software for implementation. 

41 There are two main contributions that this paper will address. The first contribution 

42 will establish theoretical results for a wide variety of function classes over many error mea- 

43 sures (which includes the squared-error loss considered in Wang (1997) as a special case) by 

44 adapting the approaches of Johnstone et al. (2004) and Kulik and Raimondo (2009a). The 

45 estimation of / can be achieved with an accuracy of order, 

f\ogn\ p asp 

V n ) ' 9 ~ 2s + 2z/ + a' 

46 where the performance is measured by the error loss from the 5£ v metric. The approach 

47 of Johnstone et al. (2004) used a hard thresholding wavelet estimator. We modify their 

48 approach by determining the appropriate threshold levels and fine scale level under the 

49 strong dependence structure in (1). 

50 The second contribution is allowing an easily implementable method for estimation in 

51 practice by modifying the already established WaveD approach of Raimondo and Stewart 

52 (2007). The WaveD software is freely available from CRAN (http://cran.r-project.org/). 

53 With our modification of WaveD, a numerical study is conducted comparing the performance 

54 of the default WaveD method and the LRD method presented here. Four popular test case 

55 signals are used to benchmark methods in the literature are the Doppler, LIDAR, Bumps 

56 and Cusp signals. These are used here and are shown in Figure 1. 

57 As will become evident in the later in the theoretical analysis. The case of LRD errors 

58 introduces some challenges to the estimation. For strong dependence there can be artificial 

59 trends in the noise which require modified thresholds in the wavelet estimator compared to 
eo the i.i.d. case. These effects are demonstrated clearly in Figure 2 for the cusp signal whereby 

61 the addition of a higher scale in the wavelet estimation deteriorates the performance of the 

62 standard WaveD method when there is a strong LRD error structure. Since the finest scale is 

63 too large, the i.i.d. method also includes the spurious trends which were artificially generated 

64 by the LRD error sequence. 

65 1.1. Outline 

ee A review of periodised Meyer wavelets and the Besov functional class are given in 

67 Section 2. In Section 3 the basic argument for the deconvolution technique is given along 
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Figure 1: Four signals that are common in signal processing and deconvolution. 

as with the convergence rate results. In Section 4, the method is implemented in R and a nu- 

69 merical study is conducted to confirm the rate results and a compare with the current WaveD 

70 methodology. The mathematical proofs are given in Section 5. 

71 2. Preliminary framework 

72 2.1. Periodised Meyer wavelet basis 

73 Let (4>, ip) denote the Meyer wavelet scaling and detail basis functions defined on the real 

74 line M; (see Meyer (1992) and Mallat (1999)). These are defined in the Fourier domain with, 

1, if \u\ < 1/3 

4>{u) = { cos(7r/2t;(3M - 1)), if \u\ G (1/3, 2/3] 
0, otherwise. 

75 and the mother wavelet function defined with, 



e -i™ 5^(1^(31^1 _ if | w | e (1/3, 2/3] 

e -i™ cos (| v(3/2\u)\ - 1)), if \u\ G (2/3, 4/3] 
0, otherwise. 



(3) 
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Figure 2: Comparison of the LRD WaveD method against the default WaveD method. In both 
cases, n = 4096 = 2 12 . The LRD method truncates the finest scale earlier as noted. For the 
LIDAR and cusp signals, SNR w 20dB. 



The auxiliary function v is a piecewise polynomial that can be chosen to ensure that the 
Meyer wavelet has enough vanishing moments. For the bivariate index (j, k), the dilated and 
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7B translated mother and father wavelets at resolution level j and time position k are defined, 

<f> jlk {x) = 2 j/2 <p(2 j x - k), i/jj,k{x) = 2 j/2 ip(2 j x - k), for j > 0. 

79 For our purposes, we are interested with a multiresolution analysis for periodic functions on 
so =£?2([0, 1]). This is done by periodising the Meyer basis functions with, 

si Consequently, any 1— periodic function / £ =^([0, 1]) can be written with a wavlet expansion 

82 with, 

2*0-1 oo 2J-1 

k=j j= jo k=0 

83 where a j0:k = (/, $j )fc ) = f(t)$ j0tk (t) dt and /3 iifc = (/, ^ fc ) = JjJ f(t)^ j>k (t) dt. 

84 This particular expansion is used since the Meyer wavelet is bandlimited (see (3)) which 
as allows the use of efficient algorithms with Raimondo and Stewart (2007) and for mathemat- 
se ical convenience in the later proofs. 

87 2.2. Functional Class 

as We analyse the estimation procedure over a Besov space of periodic functions which have 

89 a nice characterisation using the coefficients of a wavelet expansion (granted that the wavelet 

90 is periodic and has enough smoothness and vanishing moments). 

91 Definition 1. For f £ S? n ([0, 1]) with it > 1 and r > 1, 

r_ 

(23 \ TT 

£i&*r <o °- 
k=0 J 

92 The consideration of a Besov class allows a more precise analysis of the asymptotic conver- 

93 gence results of the signal / since the Besov class includes other common functional classes 

94 as a special case. Loosely speaking, a function / £ B^ r includes functions that are s times 

95 different iable with £ JS^(0, 1). The parameter r is less important, it allows a more 

96 intricate variation of behaviour in the Besov class. The special cases when n or r = oo 

97 involve replacing the £ n and £ r type norms in Definition 1 with the £oo norm. For exam- 

98 pie, if / £ B^ => sup J>0 2 j(s+1 / 2 ~ 1 / 7r ) sup^) \(3j :k \ < oo. The interested reader is referred 

99 to Donoho et al. (1995); Donoho and Johnstone (1998) and references therein for a more 

100 detailed discussion. 
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ioi 3. Function estimation method 



102 We use the wavelet shrinkage paradigm for our estimation which has become a standard 

103 statistical procedure in nonparametric estimation. We use a hard thresholding approach 

104 where the wavelet estimator is defined, 

fc=o (j,fc)eA„ 

105 where a JOi fc and /3 JOi fc are the estimated wavelet coefficients. The level jo corresponds to the 
loo coarse resolution level and the set of indices A n = {(j,k) : jo < j < j\] k = 0, 1, . . . , 2 J ' — 1} 
107 indexes details of the function up to a fine resolution level j\. The wavelet estimation 



> Xj, a scale dependent 



los procedure keeps only the coefficients in the expansion when 

loo threshold. The parameters Xj and indices A n are chosen to control the noise embedded in 

no the estimated wavelet coefficients in an optimal way in the sense of the rate results presented 

in in the next section. 

n2 For the deconvolution problem it is natural to conduct analysis in the Fourier domain 

n3 since the deconvolution operator becomes a multiplier in the Fourier domain. The deconvo- 

H4 lution model, (1), has Fourier domain representation with, 

Y[n] := I e~ 27rinx dY(x) = K[n]f[n]+e a Z H [n]. 
Jr 

Then using the Parseval identity, one can obtain a representation of the wavelet coefficients. 

Y[n 



Z H [n] 



= ^ + ^E^T^N (5) 

115 The Meyer wavelet is bandlimited (see (3)) meaning that the sums are finite. In particular, 
lie define the summation sets for the detail coefficients at scale level j with, 

3 J :={z = ±a:ae{ [2^3] , [2^3] + 1 [2^ 2 /3\ } } , (6) 

in which has cardinality |Oj| = 2 J+1 . A similar procedure is conducted to estimate the scale 
us coefficients Oj 0) fc by using the Fourier coefficients of the scale function $ JO ,jfc instead of the 
no detail function ^j t k- 
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120 3.1. Maxiset Approach 

121 The methodology used here is an amalgamation of the deconvolution work of Johnstone et al. 

122 (2004) and the LRD work of Kulik and Raimondo (2009b). The nonlinear wavelet estimator 

123 (4) can be analysed using the maxiset approach with the level dependent threshold A = Aj 

124 and fine resolution level j\ which depends on a and v. 

125 Fine resolution level. The range of resolution levels (frequencies) where the estimator (4) 

126 is considered is, 

An = {{j,k) : j < j < ji,0 < k < 2 J } . (7) 

127 The finest resolution level j\ is set to be, 

/ n a \ 1 /(«+ 2 ^) 

2* = 2^ := , (8) 

\\ognJ 

128 Then consider the precise form of the level dependent thresholds in A = Aj. As in previous 



129 works, this level dependent threshold will have three input parameters, written, 

Aj ^0~j,v,aCm (9) 

130 where the three values are given by, 

131 • £ : a constant that depends on the tail of the noise distribution. Theoretically this 



should satisfy the bound £ > 2yf a(p V 2). 

of the level-dependent scaling factor that is based on the convolution kernel and level 
of dependence, 

a, = a^ a = 0(2^ a ~ 2 ^ 2 ). (10) 
c n : a sample size dependent scaling factor, 



c n = y/n a \ogn. (11) 

136 The smoothing parameter is intentionally denoted £ to distinguish it clearly from the smooth- 

137 ing parameter, denoted t] used in the i.i.d. case in Johnstone et al. (2004) and its WaveD 

138 implementation in Raimondo and Stewart (2007). 

139 Theorem 1. Consider model (1) with the wavelet estimator (4) with the coefficient esti- 
Mo mates in (5) using the thresholds and resolution levels given by, (7), (8), (9), (10) and (11). 
mi Assume that p > 1 and f 6 B^ r with tt > 1, s > 1/tt where r satisfies 

{2v + a)p (2u + a)p-2' 



< r < min 



2s + 2u + a 2s + 2v -^a 

7T 
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142 Then there exists a constant C > such that, 

logn 



E 



< C 
v \ n 



where ||-|| is the standard p-norm and, 



if s>(2u + a)U- -~); (12) 



2s + 2u + a ~ y ' \2tx 2 

M0 1 P 2 i ^ --v--<s<(2u + a) (£-- . (13) 
2s + 2i/ + a — - 7r 2 \2n 2 J 

143 Remark 1. There is an 'elbow effect 7 or 'phase transition' in the rates of convergence switch- 

144 ing from (12) to (13) which are usually referred to as the 'dense' and 'sparse' phases respec- 

145 tively. This is similar to the effect seen in Kulik and Raimondo (2009a) and Johnstone et al. 
we (2004)- The fBm errors has increased the size of the dense region in comparison to the stan- 
147 dard Brownian motion when the boundary was at s = (2v+l)(p/27r— 1/2) (see Johnstone et al. 
us (2004))- Also for the sparse region, the condition p > 2/ {2v + a) is needed to ensure that it 

149 is well defined. When 2v + a > 2, this is not a restriction since it is assumed that p > 1. 

150 Remark 2. The rate results are consistent with works on LRD and inverse problems in 

151 the literature in the sense that the rate of convergence deteriorates as a decreases (stronger 

152 dependence) or the DIP parameter v increases (more ill-posed). In particular, our rate results 

153 agree with the results obtained in Johnstone et al. (2004) i> n the i.i.d. deconvolution setting 

154 with smooth convolution, \K(u)\ ~ with the choice a = 1. Our rate results also agree 

155 with the results obtained in Kulik and Raimondo (2009a) with the direct regression model 
we with LRD errors with the choice v = 0. The results agree with minimax results the for the 
157 squared-error loss (p = 2) by Wang (1997). 

i5s Remark 3. Our estimator is adaptive with respect to the smoothness parameter s since the 

159 tuning paradigm does not depend on s. However it is not adaptive with respect to the LRD 

wo parameter a since the level dependent thresholds, (8), (10) and (11) depend on a. 



lei 4. Numerical Study 

162 A numerical study is now considered with the focus being the effect of the dependence 

163 structure. Ideally, it would be desirable to compare the LRD WaveD method introduced 

164 here with the minimax optimal WVD method considered by Wang (1997). However, to the 

165 authors knowledge, no freely available implementation of the WVD method exists. 
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166 One of the immediate challenges of implementing the LRD approach is that a is unknown 

167 in practice. The estimation of a is challenging problem in its own right that has been studied 

168 in the literature, see for example Veitch and Abry (1999) and more recently Park and Park 

169 (2009) in this regard. Having full data driven estimates of all the parameters is beyond the 

170 scope of this work and we will assume that a is known. For our purposes we wish to compare 

171 the performance of the regular WaveD approach which was designed for i.i.d. observations 

172 with the LRD extension suggested here. 

173 The default WaveD method uses a stopping rule in the Fourier domain using the results of 

174 Cavalier and Raimondo (2007) to estimate the finest permissible scale level in the expansion. 

175 This concerns the case of model (1) with a standard Brownian motion (a = 1) where 

176 2 jl = 2 J ' 1 > 1 = (to/ logn) 1 /( 1+2 ^. For a fair comparison, the finest permissible scale level, j a i, 

177 should be estimated in the same vein for our LRD extension. The Fourier stopping rule is 
us extended to the LRD framework below. 

179 Briefly reviewing the fine scale estimation method of Cavalier and Raimondo (2007), the 

wo scenario of an unknown convolution kernel K is considered. To alleviate this they assume 

181 that it is possible to choose input signals / in (1) to gain information about K. In particular, 

182 the Fourier basis is chosen / = (ee) e&z and passed into the deconvolution model. Doing 

183 so here, one would pass the Fourier basis into (1) and denote this new information with 

184 dY e (x) = K * eg(x) + e a dB H (x). Due to the orthogonality of the Fourier basis, the Fourier 

185 domain representation of Y e is 

Y e [£] = K[£]+e a W H [£] 

186 where Wh[£] is a complex Gaussian random variable identically distributed to Zjj[£] but 

187 independent of Zjj[£\. We then estimate the fine scale level by, 

? a ,i= Llog 2 Mj -1 (14) 

188 where the stopping time M is determined in the Fourier domain with, 

M = min j^, £ > : \Y e [£]\ < £ a/2 e a log 1/e 2 } , 

189 and L^J is the largest integer small than x. The estimate j a> \ is close to j a> i with high 
wo probability due to Lemma 1 in Section 5. 

191 J^.l. Implementation "procedure 

192 We are now in a position to be able to implement an estimation procedure for the LRD 

193 model (1) using a modification of WaveD method of Raimondo and Stewart (2007). This is 
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194 conducted using the test functions shown in Figure 1. A discretely sampled deconvolution 

195 model is repeatedly simulated with these test signals and the WaveD and LRD modification 
we estimates computed. The performance measure is the Mean-Square Error (MSE) which is 
197 calculated empirically over M = 1024 repeated simulations. 



we 1. Choose f{t) to be a LIDAR, Doppler, Bumps or Cusp signal (see Figure 1) over the 

199 grid ti = i/n with i = 1,2, ... ,n and n = 2 12 = 4096. The LIDAR and Doppler signals 

200 were generated from by the code from the WaveD package and the Bumps and Cusp 

201 signal code was imported into R from the WaveLab package in MATLAB. The signals were 

202 standardised to agree with the signal levels in Cavalier and Raimondo (2007). 

203 2. Simulate the LRD error process using the f racdif f package of R available from CRAN. 

204 Use the f racdif f .sim command to simulate a FARIMA sequence, the parameters 

205 kept at their defaults with the dependence controlled with d = (1 — a)/2 and the LRD 

206 error process was standardised to have unit variance. 

207 3. Generate the convolution kernel, K = k, to be the Gamma density with scale parameter 

208 0.25 and shape parameter 0.7 (The DIP in this case is v = 0.7). 

209 4. Generate the data yi = k * f(ti) + a2~ a ^ 2 ei where is the FARIMA sequence. The 

210 size of a is governed by the blurred signal-to-noise ratio (SNR) where 

SNR = 101og 10 (||K*/|| 2 /^ 2 ) 

2n The SNR is considered for three scenarios SNR = lOdB (high noise), 20dB (medium 

212 noise) or 30dB (low noise). 

213 5. Compute the default WaveD estimator which assumes three vanishing moments and 

214 starts the wavelet expansion at scale level jo = 3. The level dependent thresholds are 

215 computed, Xj = r]TjC n where rj = y/E, c n = a^J\ogn/n and Tj = (|Oj| -1 ^2 . \K[i] l^ 2 )^ 1 / 2 . 

216 The estimated noise level, a = MAD(yj t k)/0.6745 where MAD is the median abso- 

217 lute deviations and yj^ = (y,^j y k), the wavelet coefficients at the finest scale. The 

218 fine scale j\ = ji i is estimated using the default WaveD method which is based on 

219 Cavalier and Raimondo (2007). 

220 6. Compute the LRD WaveD estimator which uses the same hard thresholding wavelet 

221 expansion. However, now the fine scale level and level dependent thresholds are mod- 

222 ified. The fine scale level, ji = is estimated using (14) and the level dependent 

223 thresholds are estimated using Xj = C,T a> jC n where c n = a \/\ogn / 'n a and r a j calculated 

224 using (31) and (33). The smoothing parameter £ was tested for various different values. 
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225 Similar to the i.i.d. case seen in Johnstone et al. (2004), the default bound given in 

226 the theory with £ > 2v / 2a was much too conservative. Instead the performance was 

227 found to be much better with the choice £ = \/a. The presented results here are for 

228 the choices £ = y/a and £ = \^2a, the latter being more effective with low levels of 

229 a. In fact, the simulations were also conducted for £ = via, \f&a and y/8a. These 

230 bigger smoothing parameters were only competitive for the cases when a < 0.3 and 

231 are omitted from the tables for brevity. 

232 7. Compute the empirical version of the MSE where, 

2 i A/ I, 2 

MSE(fJ) = E f-f =^\\fi-f ■ 

i=l 



233 Jf..2. Numerical results 

234 The results of the procedure are outlined in Table 1 and Table 2. As stated earlier, the 

235 focus of the numerical study is the effect of the dependence structure so the DIP is fixed at 

236 v — 0.7. The most obvious fact is that overall the convergence rate tends to deteriorate as 

237 the level of dependence increases. This is consistent with the theoretical results in Section 3. 

238 The numerical results are not overall conclusive in favour of one method over the other for 

239 all noise levels. The main complication arises in the truncation of the wavelet expansion with 

240 the fine scale levels ji and j ai \. The typical fine scale levels (rounded to the nearest integer) 

241 are shown in Table 1 and Table 2 stated in parentheses for each case for each method. Due 

242 to the construction, j a l < ji i for all a G (0, 1] and this is reflected in their estimates shown 

243 in Table 1 and Table 2. As expected j a i decreases as a increases (stronger dependence) 

244 which is consistent with the theory. On the other hand, the naive i.i.d. estimator increases 

245 as a decreases which can be either detrimental or beneficial to estimation as discussed below. 

246 In some cases the earlier truncation in the LRD method is favourable such as the LIDAR 

247 and Cusp signals with a strong level of dependence which is shown in Figure 2. The addition 

248 of higher scales to the wavelet expansion in the i.i.d. method does contribute to capturing 

249 more of the cusp feature and the last two peaks of the LIDAR signal. However, it is paid 

250 at the price that higher scales include spurious effects from the LRD noise, resulting in a 

251 poor estimator. This is reflected in Table 1 where the MSE is smaller when j a> i < j\ with 

252 the exception at the LIDAR signal at 20dB and a = 0.6 and the Cusp signal at lOdB and 

253 a = 0.8. 

254 On the other hand, the earlier truncation of fine scale levels in the LRD method means 

255 that important features are not captured in the signal. This is seen in the Bumps and 
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Figure 3: Comparison of the LRD WaveD method against the default WaveD method. In both 
cases, n = 4096 = 2 12 . The LRD method truncates the finest scale earlier as noted. For the 
Bumps and Doppler signals, SNR £s 20dB. 



256 Doppler signals where the i.i.d. method tends to outperform the LRD method. In this 

257 situation the higher level scales in the expansion capture more features in the signal while not 

258 introducing the too much of the spurious LRD noise effects. A typical medium noise scenario 
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259 demonstrating this is shown for the Bumps and Doppler signals in Figure 3. Referring to 

260 the Doppler deconvolution in Figure 3 one can see that the spurious trends generated by 

261 the LRD noise are included in the default WaveD reconstruction, however since ji > j Q ,i, 

262 the higher frequences of the Doppler signal can be captured. The LRD method does not 

263 include the spurious trends but pays the price of losing the higher frequencies in the Doppler 

264 signal. A similar behaviour is present in the Bumps signal. This behaviour is reflected in 

265 the numerical study in Table 2 where the earlier truncation shows that the i.i.d. method 

266 outperforms the LRD method. However for the case of severely dependent noise at a = 0.2 

267 for the Doppler signal, the i.i.d. method loses its advantage and the addition of higher scales 

268 does not outperform the LRD method. 

269 Therefore the LRD estimation method presented here offers an easily implementable so- 

270 lution that is resistant to the effects of long memory. The method is attractive to the case 

271 where the underlying function / does not have a lot of transient high frequency behaviours 

272 where the higher scales of the wavelet transform are crucial. In the case where high frequen- 

273 cies are crucial to the signal, the established i.i.d. WaveD method is perhaps more favourable 

274 since the signal features at higher scales are more important than the spurious noise. 

275 5. Proofs 

First the probabilistic result of the numerical estimation of the highest permissible scale 
level is given. Define the frequency levels, 

M c = mm{ej > : \K[£}\ < £ a/2 e a (log l/e 2 ) 4/3 } (15) 
M d = min{e,£ > : \K[£]\ < £ a/2 s a (log l/e 2 ) 2/3 } . (16) 

276 Lemma 1. Define the event, B = nf^ {|£- a Z H [u;]| < |A[w]|/2j. Then 

P(B C ) < cM.exp {-(logl/£) 1+ «} =: fi n 

277 for some constants c, £ > 0. Further, define the event, M. = {M c < M < M^} then P(Ai c ) = 

278 <D(tt n ) and M c < M d < M x where M x = [2 jl \ . 

Proof of Lemma 1. First we prove the statement that, P{B C ) < Vt n . By definition 
of M d in (16) there exists a c G (0,1) such that for all u = 1,2, . . . , M d ; \K[u]\/2 > 
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2/3 

J- HUD, 

\K[u] 



fr/ 2 e a (logl/e) 2/3 . Thus, 



P(B c ) = P\n\\s a Z H [u]\> 

M d 

<^P(\e»Z B [ U ]\>\K[ U ]\/2 

u=l 
M d 

<^p(|^M|>^/ 2 (logl/, 2 ) 2/3 ) 

w=l 

From (32), Var ^Zh-[u;] J x and apply the tail inequality for Gaussian random vari- 

ables, let Z ~ A/"(0, 1), then there exists a constant C > such that, 

M d 

P(B C ) < J2 P {\ Z \ > cujl/2 (logl/£ 2 ) 2/3 ) 

w=l 

< CM d exp j~c 2 (logl/e) 4/3 | = (17) 

For Ai c the event can be written Ai c = {M > M d } U {M < M c } and start by considering 
the first scenario, 

P(M > M d ) < P(nff4 [\Y e [u]\ > c^Vlogl/e 2 }) 
<P(\Y e [M d ]\>Mfe a logl/e 2 ) 

< P(\Y e [M d }\ > M^ /2 e a \ogl/e 2 , B) + P(B C ) (18) 

279 Now under the event B with (2), \Y e [M d }\ < ~\K[M d ]\ < C\M d \~ u . Further, by definition of 

280 M d and (2) 

/ n-2/(2^+q) 

M d ^(e a (\ogl/ef 3 ) (19) 



281 which means that as n — > oo, M d v a ^ 2 = o(e a (logl/e 2 )). Hence, from (18) with (19) and 

282 (17), there is an N G Z + such that for all n > N, 

P{M > M d ) < P{M~ v - al2 > e a log 1/e 2 ) + P{B C ) = P{B C ) = 0{Q n ). (20) 
On the other hand, 

P(M <M C )<P (uf^ 1 {\Y e [uj}\ < cu a / 2 e a \ogl/s 2 Y 



Mr -l 



< Y,P(\YeM<cu a / 2 e a \ogl/e 2 ) 



0J=l 



283 Under the event B, \Y e [u] \ > \K[u]\/2 > cM~ u for all u = 1, 2, . . . , M c - 1. Similar to (19), 

284 by (15) and (2), 

M c ^(e a (logl/e 2 ) 4/3 Y 2/(2U+a) (21) 
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285 and consequently, as n — > oo, e a (\ogl/e 2 ) = o(M c 



-v-a/2\ 



Hence from (17), 



P(M < M c ) < P(B C ) = 0(Q n ) 



(22) 



286 Therefore, (20) with (22) yields the first result of the Lemma. The second result of the 

287 Lemma follows from (19) and (21). □ 

288 5.1. Maxiset Theorem 

As in similar previous works in Kulik and Raimondo (2009a); Johnstone et al. (2004) 
the proof of the main result imitates the same maxiset approach. Roughly speaking, this 
approach finds the 'maxiset' class of functions for a general hard thresholding wavelet esti- 
mator. This Maxiset theorem is stated here for easy reference. First, introduce the notation: 
ji will denote the measure such that for j G N, k G N, 



A* {(;>*)} = \Wj,u,a^j„ 



a p 2 J{ 2 



* Hp ' 



l q ,oo(m) = \ f , sup \ q fi {(j,k) : \/3 jtk \ > er, >|Ce A} < oo \ . 

I A>0 J 



290 Theorem 2 (Maxiset). Let p>0,0<q<p and {^j,k,j > — 1, k = 0, 1, . . . , 2 j ~ 1 } be a 

291 periodised wavelet basis o/jS^([0, 1]) and o~j is a positive sequence such that the heteroscedastic 

292 basis o-jtyj } k satisfies the Temlyakov property. Suppose that the index set A n is a set of pairs 

293 (j, k ) and c n is a deterministic sequence tending to zero with, 



sup /x(A n )c^ < oo. 

n 

294 If for any n and any pair (j, k) G A n we have, 

nhk - M 2p < C(a jCn ) 2 *>; P{\P hk - j)k \ > Ha jCn /2) < C(c 2 J> A a 

295 for some positive constants £ amd C then the wavelet estimator, 



satisfies the following for all positive integers n, 



E 



fn f 



< Cc p ~ q . 



if and only if, 



f G Zg,oo(/-0 and sup c q n p 

n>l 



< OO. 



(23) 



(24) 



(25) 



16 



The proof of the rate result in Theorem 1 will use the Maxiset Theorem by verifying the 
regularity conditions, (23), (24) and (25), then choosing q for the dense and sparse regions 
respectively with, 

^-rirr When »>(" + «/2)(pAr-l) (26) 
2s + 2v + a 



(2u + a)p — 2 

q = q s :=-^ ^ T when 1/tt - a/2 - u < s < (u + a/2)(p/n - 1) (27) 

2s + 2u + a 



298 5.#. Stochastic analysis of the estimated wavelet coefficients 

299 By definition, it is clear that the estimated wavelet coefficients have no bias. Consider 

300 now the covariance structure of the Zh process, 

Cov( Z^[u],Z^[e[) = EZh[cu}Zh[£} = E I e~ 2 ™ x dB H (x) [ e 2wi£y dB H (y). 

301 To evaluate this, appeal to Theorem 2 of Wang (1996) which uses a representation of the 

302 fractional Gaussian noise process via a Wavelet- Vaguelette Decomposition (WVD) 

dB H (x) = ^2^j,kUj,k(x) dx, 

303 where is a white noise process and Uj k is a set of vaguelette basis functions defined with, 

where A = is an elliptic operator and (pj^ ■ K — > K is a set of orthogonal wavelet 
functions. Using this representation of fractional Gaussian noise we can write, 

i,k j',k' ^ R ^ R 

= E ( 28 ) 

304 The operator (— A) - ^ -1 ' 2 )/ 2 for G (1/2, 1) is known from the theory of singular integrals 

305 as the Reisz Potential (see for example Stein, 1970, Chapter V) and has the representation, 

(-A) - 2 f(x) = ^ ^ / f{z)\z-x\ H - z ' 2 dz. (29) 



0F2*-ir (f - i) 

For our purposes its behaviour in the Fourier domain has been evaluated by Samko, Kilbas, and Marichev 
(1993). Indeed, apply Theorem 12.2 of Samko et al. (1993) with (29) then for any / G J£?i(R), 

^ ( _ A) i/4-h/ 2/m = 7 HM V2-* (30) 
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306 From (28) and (30), it would be desirable to use a wavelet function ip G Jzfi(IR) that has a 

307 simple behaviour in the Fourier domain. Naturally, a suitable choice is the Meyer wavelet, 

308 (p = ip, since it is bandlimited (see Section 2.1) and makes the calculations easier. Therefore 

309 we have, 

Cov [Zh^IZh^]) = \ut\ 1 ' 2 - H Y^h{u'^M- ( 31 ) 

Consider an arbitrary u G Z and consider the possible values of j such that uj G Dj. 
The summation sets will have a non empty intersection, Dj fl 3j> ^ 0, if and only if j' G 
{j — l,j,j + 1}. Therefore the summands in (31) will be nonzero for, at most, three different 
values of j. The Meyer wavelet also is bounded with, < 1. Therefore we can crudely 
bound the magnitude of the covariance with, 

2-7-1 

EE 2 '- 

jei. fc=o 



Cov[Z H [u],Z H [£}) =\ui\ 1 ' 2 - H 



-j 2 1 ~ : >-rri(£—u))k 



(32) 



Thus we are in a position to bound the variance of the estimated wavelet coefficients, 



Var = , 2 «Var I £ 



.2a 



EE 



(33) 



w&Sjt&Sj K[£)K[u] 

To gain insight to the overall asymptotic structure, the variance of the coefficients needs to 
be bounded. Use (32) and that the cardinality of Dj, |Dj| = 2 J+1 , 



^ 3 E 



i/2-h 



#0 



11/2- 



JGID),- 



K[u\ 



<3|%|2-^ 



wen 



ATM 



1-2H 



< 



«E 



< 6 sup Ixl 1 " 2 ^ sup \K[y}\-' 2 I \^{z)\ 2 dz 



1-2H 



x&. 



< C2~ j{l ~ a - 2u) II^H 2 ., 
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310 where the last two lines follow by Parseval and Plancherels identities and condition (2). 

311 Thus, the asymptotic behaviour of the variance of the wavelet coefficients at scale j are 

312 bounded with, 

Var = e 2 ^ < Ce 2 ^— 2 ^ < Co 2 ^ 

313 where <jj il/|Q , and c n are defined in (10) and (11). Since (3j^ is Gaussian, then from the variance 

314 bound it follows that, 

E 



2p 



Let £ > ^Aa{p V 2) and Z ~ A/"(0, 1), then from the tail inequality for Gaussian random 
variables. 



P 



/3j,k — Pj,k 



> &J^Cn\ =2 p( Z > ^ V/I ^ 

2 J V " 2 

^ 2V2 / e^ogn 

— ^=^= 1 A 

^V7rlogn t 4 
<Qn-T<C(c^Ac^). 

315 This verifies the wavelet coefficient conditions in (24). 

316 5.3. Temlyakov Property and resolution tuning 

317 Recall A n = {(j, k) : — 1 < j < k = 0, 1, . . . , 2 3 ' — 1}. Consider the set of scales A* = 
31s {j G Z : — 1 < j < j a ,i,ja,i € Z + }. Then from Johnstone et al. (2004, Appendices A.l & 

319 B.2), the heteroskedastic basis {<x,-, \l/j j fc(-)} satisfies the Temlyakov property as soon as, 

Vo) < C sup (2V|) and ^ C SU P (2 Jp/2 <) . 

This property is satisfied in our framework with cr, = ct^q < C2~ ] / 2<yl ~ a ~ 2u \ Now verify 
the resolution tuning condition (23). 

/i(A n ) = y°lu,c? j{p,2 ~ l) II* lip < C2^ a+2 ^ 2 . 

320 Using the choice, 2- ?Q > 1 x {n a / \ogn) 1 ^ a+2u ^ yields /i(A„)c^ = Oil). Consequently, (23) is 

321 verified. 

322 5.4. Besov Embedding 

323 One needs to find a Besov scale such that £>f r C i g oo and that the maxiset condition (25) 

324 holds. As shown in Johnstone et al. (2004), the problem can be simplified by considering 



19 



325 the the set of functions l g (p) C l q<00 (p) where, 



P 

where Aj is a set of cardinality proportional to 2 J . Since ||c"j^-,A;||p = (J V j2 ■* then / e / ? (/^) 



327 if 



^ 2 2(p-2-(p- ff )(l-a-2„)) £ = ^ 2 i<(-+2 )(i- 1 )+2"iJ ^ 1^(9 < ^ 

j>0 fc=0 j>0 k=0 

This condition is true for / G £>g 9 with the choice 5 — [y + f)(- — 1)- Then depending on 
whether we are in the dense or sparse case the levels s, ir and r are determined such that 
B% r C B\ 5 . There exists two Besov embeddings given by, 



B a V)T C B£r when < p < ir and s > 7; (34) 
B;_ r C^ when 7r<pands-l/7r = 7-l/p. (35) 

328 The dense phase. Choose the level of q = qd where, 

p{2v + a) , . . . . . 

: = 1 — : — when s > [y + a/2){p/n- 1). 

329 Then find the levels s, it and r such that B^ r C where s > (z/ + a/2)(p/n — 1). 

330 By definition, one has, s > (a/2 + v) (p/ft — 1). Eliminate p by substituting p = (2s + 

331 2v + a)<?d/ (2z/ + a) yields, n > qd- So we need to prove s > 5 = (2v + a) (p/2q d — 1/2) to 

332 be able to use (34). However, by definition of qd we have, 5 = (v + a/2) (p/qa — 1) = s > 

333 by assumption. 

334 The sparse phase. Choose the level of q = q s where, 



(2v + a)p — 2 

q s := — 7, when 1/7T - a/2 — v < s < (u + a/2)(p/7r - 1). 

2s + 2v + a — - 



To ensure the inequalities in the above equation are valid, it requires that p > 2/{2v + a). 
Then 5 = {2v + a) (p/2q s — 1/2) and we have, 

5={2v + a) (p/2q s -l/2) 

(2u + a)(sp — p/n + 1) 
(2v + a)p-2 ' 

335 Consider the scenario when n > q s and use embedding (34). This requires that s > 5 = 

336 (2v + a)(sp — p/n + l)/((2i> + ct)p — 2), or equivalently, s < [y + a/2)(p/n — 1). It is also 

337 needed that 5 > 0, which implies that either (i) p > 2/{2v + a) and s>l/7r — or (ii) 
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338 p < 2j(2v + a) and s < 1/tt — 1/p. The (ii) scenario is impossible since it is assumed that 

339 p > 1 and s > l/7r which contradicts s < l/n — l/p. The condition p > 2/(2zv + a) is verified 

340 since by assumption in the sparse phase, v > 1 — a/2 and p > 1. Then (i) implies that, 

341 s > 1 / 7r — v — a/2. 

342 Consider now the scenario when tt < q s by definining, 

s' = s - 1/tt + 1/g. (36) 

343 Then use the embedding (35). Indeed, if we solve (36) with s' = 5, then q = q s and the 

344 embedding of (35) applies. 

To apply Theorem 2, (25) needs to be verified. Therefore we need to find a 5 > such 
that for any / G B * r , (25) is satisfied. 



r q-p 



345 The above is bounded uniformly in n if we choose 5 = l/2(2u + a)(l — q/p). Now we need 

346 to find s, n such that B^ r C iB* r . 

Consider the first case tt > p. This case cannot occur in the sparse phase due to (13) 
and the assumption that s is positive. In the dense phase, use embedding (34) with 7 = 5 
and q = qa- Therefore, (34) holds if s > \/2(2v-\- a)(l — qd/p)- This implies, 

s>l/2(2u + a)(l-q d /p) 

{2v + a)s 
~ 2s + 2v + a 

347 which always holds under the assumption that s > 0. 

348 Now consider the dense case when tt < p. In this scenario use embedding (35) by defining 

349 s — 1/tt = 7 — 1/p which ensures B^ r C BJ r . Then complete the embedding using (34) 

350 (namely, BJ r C B, 5 ) which requires 7 > 5 with q = q d or equivalently after rearrangement, 

351 2s7+(2v+a)(l/p-l/7r) > 0. The left hand side is greater than (s-l/7r)(p/7r-l)(2i/+a) > 

352 when s > (v + a/2)(p/n — 1) (which is true in the dense phase). 

353 The last case to consider is the sparse case when n < p. Again introduce a new Besov 

354 scale 7 defined with, s — 1/tt = 7 — 1/p and apply a similar argument to above which requires 

355 that, 7 > 5 with q = q s . This is satisfied if s > 1/tt, which always holds. 

356 5. 5. Proof of Theorem 1 

357 The proof of the theorem is an application of Theorem 2 with the choice of o~j = cr J;i/ a and 

358 c n and £ defined in Section 3.1 and the arguments in Section 5.2, Section 5.3 and Section 5.4. 
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359 The dense regime rate result of (12) is derived with the choice of q = qd in (26) and the Besov 

360 embedding argument in Section 5.4. Similarly, the sparse rate regime in (13) is derived with 

361 the Besov embedding result in Section 5.4 with the choice of q = q s in (27). 
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0.0248 


^4) 


0.0299 




0.0393 


^4) 




£ = V2^ 


0.0151 


'5) 


0.0164 


(5) 


0.0262 


'4) 


0.0305 


'4) 


0.0382 


'4) 


LIDAR 


i.i.d. 


0.0049 


6) 


0.0049 


I") 


0.0060 


7) 


0.0111 


J) 


0.0230 


7) 


30dB 




0.0041 


6) 


0.0044 




0.0057 


6) 


0.0088 


[6) 


0.0103 


'■A 




e = v 7 ^ 


0.0053 




0.0054 


(6) 


0.0059 


6) 


0.0073 


6) 


0.0096 


5) 



Table 1: MSE for the Bumps and Cusp signals with n = 4096 and M = 1024, the smoothing 
parameter for the LRD WaveD method is £ = y/a, the i.i.d. method uses the default r] = \/6. 
The typical estimated fine scale levels are shown in parentheses for each case. 
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DIP = 0.7 



Signal 


Method 


1 




0.8 




0.6 




0.4 




0.2 




Bumps 


i.i 


d. 


0.7657 


[A) 


0.7717 


[A) 


0.7904 


;4) 


0.8277 


(4) 


0.9475 


[5) 


lOdB 






0.7615 


4) 


0.7705 


4) 


0.9376 


3) 


0.9511 


13) 


0.9831 


3) 






\/2a 


0.7687 


4) 


0.7768 


4) 


0.9380 


3) 


0.9517 


13) 


0.9818 


3) 


Bumps 


i.i 


d. 


0.5384 


5) 


0.5405 


5) 


0.2905 


6) 


0.2702 


16) 


0.3165 


6) 


20dB 






0.5374 


■5) 


0.5405 


5) 


0.5469 


5) 


0.7583 


14) 


0.7661 


4) 






\/2a 


0.5391 


5) 


0.5417 


5) 


0.5473 


5) 


0.7583 


(4) 


0.7660 


4) 


Bumps 


i.i 


d. 


0.0793 


7) 


0.0807 


7) 


0.0838 


7) 


0.0897 


7) 


0.0988 


7) 


30dB 






0.0777 


7) 


0.2074 


6) 


0.2093 


6) 


0.21246 


(6) 


0.3141 


6) 






\/2^ 


0.0811 


7) 


0.2080 


6) 


0.2094 


6) 


0.2118 


16) 


0.3133 


6) 


Doppler 


i.i 


d. 


0.0278 


'5) 


0.0293 


'5) 


0.0369 


;5) 


0.0631 


(5) 


0.1199 


'5) 


lOdB 


e = 




0.0263 


■5) 


0.0472 


4) 


0.0570 


4) 


0.0689 


1 4) 


0.1203 


3) 




e = 


\/2a 


0.0292 


5) 


0.0481 


4) 


0.0568 


4) 


0.0666 


14) 


0.1196 


3) 


Doppler 


i.i 


d. 


0.0103 


'6) 


0.0106 


'6) 


0.0122 


'6) 


0.0183 


'6) 


0.0455 


7) 


20dB 


e = 




0.0096 


[6) 


0.0104 


[6) 


0.0200 


'5) 


0.0257 




0.0304 


'5) 




e = 


\/2a 


0.0107 


6) 


0.0110 


6) 


0.0197 


5) 


0.0248 


15) 


0.0284 


5) 


Doppler 


i.i 


d. 


0.0029 


7) 


0.0030 


7) 


0.0032 


7) 


0.0043 


'8) 


0.0075 


;s) 


30dB 


e = 


s/a 


0.0027 


7) 


0.0029 


7) 


0.0036 


7) 


0.0055 


'7) 


0.0094 


7) 




e = 


\f2a 


0.0031 


7) 


0.0031 


7) 


0.0033 


7) 


0.0039 


7) 


0.0066 


7) 



Table 2: MSE for the Doppler and LIDAR signals with n = 4096 and M = 1024, the 
smoothing parameter for the LRD WaveD method is £ = \fa, the i.i.d. method uses the 
default r] = a/6. The typical estimated fine scale levels are shown in parentheses for each 
case. 
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